Abstract


Leveraging data collected from smart meters in buildings can aid in developing policies towards 
energy conservation. Significant energy savings could be realised if deviations in the building 
operating conditions are detected early, and appropriate measures are taken. Towards this end,
machine learning techniques can be used to automate the discovery of these abnormal patterns in 
the collected data. Current methods in anomaly detection rely on an underlying model to capture 
the usual or acceptable operating behaviour. In this paper, we propose a novel attention mech- 
anism to model the consumption behaviour of a building and demonstrate the effectiveness of 
the model in capturing the relations using sample case studies. A real-world dataset is modelled 
using the proposed architecture, and the results are presented. A visualisation approach towards 
understanding the relations captured by the model is also presented. 
1. Introduction


As per the recent global status report by UNEP[1], buildings’ energy demand increased by 
about 4% from 2020. Operation of buildings accounted for 30%[2] of global energy consump- 
tion with nearly 20%[3] for air-conditioning in buildings and is likely to increase due to in- 
creased consumer demand and living standards. Thus reducing operating power consumption 
without compromising living standards is of utmost importance to achieve ambitious goals to- 
wards a sustainable future. Around one-third of this consumption can be attributed to negligent 
behaviour of consumers, e.g., opening windows with AC turned on or using wrong settings on 
the ACs[4]. Automated discovery of such instances has immense potential to provide significant 
energy savings. 

Anomalies are defined as instances that occur relatively rarely. The general approach to
detecting these instances involves modelling the normal behaviour. Events that are not captured by
the model are termed anomalies. Data collected from modern buildings is usually a multivariate
time series with multiple features. Manually developing models for such systems becomes
cumbersome and is not scalable. Machine learning techniques have demonstrated their ability to
capture complex non-linear patterns from the data. These techniques promise to be good candi- 
dates for automating the modelling process. A comprehensive review of these techniques applied 
to building energy anomaly detection is presented in [5]. 

Most researchers resorted to developing unsupervised techniques for anomaly detection due 
to a lack of labelled data [5]. Among these techniques, the recent works include a combination 
of variational autoencoder[6] and LSTM[7]. Though the techniques produce results superior to 
the conventional autoregression methods [8], the RNN architecture processes the time frames 
sequentially, which could potentially increase training times for large-scale multivariate data. 
The Transformer architecture[9] alleviates this problem with the attention mechanism. It demonstrated
state-of-the-art results in the field of machine translation. It has been used in [10]
for anomaly detection and shown to be superior to other deep learning techniques. Since
the transformer architecture was designed for language processing tasks, it captures
temporal dependencies at various scales. In addition to the temporal dependencies, building
energy data has complex interrelationships across the features that need to be considered during
energy consumption modelling. We propose a novel attention mechanism to capture these
relationships in a multivariate time series and test the model on synthetically generated and
real-world case studies of power consumption in buildings. Specifically, we train our model on
windows generated from the data. The trained model is then used to reconstruct the windows of the
test data. The difference between a given window and its reconstruction is used to identify anomalies.
In the present work, the performance of the model in capturing the relations is established
qualitatively using attention maps.

In summary, the main contributions are as follows: 


1. We propose a novel algorithm for anomaly detection in multivariate time series termed as 
TiFeAuto that can model the complex interrelations among the features across the time 
windows. 

2. In addition to identifying the anomalous points, we propose to use the attention maps and 
understand the internal working of the model during window reconstruction, thereby in- 
creasing the interpretability of the model. 

3. The model’s ability to capture relationships in an unsupervised fashion has been tested on 
synthetically generated data and a real-world dataset. 


2. Methodology 


2.1. Problem Formulation 


Consider a multivariate time series S with N features. S is sampled along time to generate
windows of length T. Each window is denoted by a matrix $X_t$ of dimension $T \times N$. Given M such
windows, the objective is to develop a model to capture the time-varying relations among the features
and reconstruct the given windows. During the training phase, the model extracts relevant features
from the data to allow the most accurate reconstruction. During the testing phase, the model
reconstructs the window with the learnt feature relationships. The reconstruction loss is calculated,
and a threshold is set to identify points that behave differently from others based on the relations
the model learnt. In the current work, we propose a combination of a novel attention mechanism (referred
to as TiFe attention) and an encoder-decoder architecture to reconstruct the samples. Here the
TiFe attention serves two purposes:
• Increase the robustness of the encoder-decoder architecture to outliers in the training data.
The TiFe attention weighs the input features before feeding them to the encoder-decoder architecture
in a way that points not conforming to the relations developed so far during training are
given less weightage.

• Provide a human-interpretable way to understand the relationships captured by the model.
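
The windowing step in the problem formulation can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function name `make_windows` and the `stride` parameter are assumptions introduced here, since the text only states that the series is sampled along time into windows of length T.

```python
import numpy as np

def make_windows(series: np.ndarray, window_length: int, stride: int = 1) -> np.ndarray:
    """Slice a (time, N) series into M windows, returned with shape (M, T, N).

    `stride` is a hypothetical parameter; the paper does not state how far
    consecutive windows are shifted.
    """
    if series.ndim != 2:
        raise ValueError("expected a 2-D array of shape (time, features)")
    n_steps = series.shape[0]
    if n_steps < window_length:
        raise ValueError("series is shorter than one window")
    starts = range(0, n_steps - window_length + 1, stride)
    return np.stack([series[s:s + window_length] for s in starts])

# Toy series: 10 time steps, N = 2 features, windows of length T = 4
S = np.arange(20, dtype=float).reshape(10, 2)
X = make_windows(S, window_length=4, stride=2)
print(X.shape)  # (4, 4, 2)
```

Each slice `X[m]` plays the role of one window matrix of dimension T × N that the model is asked to reconstruct.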


2.2. TiFe Attention Model 


Since their introduction, LSTM[7] architectures have become a go-to model for time series 
data. LSTM, being an RNN, is sequential when operating on time windows, leading to signif- 
icantly longer training times. Also, the hidden state needs sufficient latent space to capture the 
information of all the previous states. This problem is alleviated using a transformer in [9] with 
an attention mechanism where the model can see information across all the windows and iden- 
tify any potential relations across the time dimension. This approach allows for parallelisation 
leading to reduced training time. Like the LSTM model, the transformer architecture constrains 
the model from capturing all the previous state information in the current state with the same 
feature dimension. To allow the model to create connections across features of different time 
steps without losing much information, we propose the below mappings: 


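• $f: \mathbb{R}^{T \times N} \to \mathbb{R}^{T \times d_a}$, to capture relationships across the time dimension

• $g: \mathbb{R}^{N \times T} \to \mathbb{R}^{N \times d_a}$, to capture relationships across features

• $h: \mathbb{R}^{N \times d_a} \to \mathbb{R}^{T \times N}$, to utilise the mappings developed above and create a
representation of the features across various time scales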


where $d_a$ is the latent space dimension of the TiFe Attention (a hyperparameter of the TiFe
Attention model). Each of these transformations is parameterised with fully connected neural
network layers. In addition to reduced training time with parallelisation, the model allows for
better interpretation with the help of attention maps.

Extending the scaled dot-product attention introduced in [9], we define an attention mechanism
(referred to as TiFe Attention) that captures relationships across the time and feature
dimensions, as shown in Figure 1.

The input windows ($X_t$) are fed to the TiFe attention layers, and the corresponding attention
matrices $A_t$, $A_f$ along the time and feature axes are determined by matrix multiplication. The
attention matrices are scaled[9], and a softmax function is applied over the matrices to determine
the weights of each of the input values along the time/feature dimension. These weights indicate
the degree of conformance of the observed data point with the relationships learnt. These weight
matrices are multiplied with the original input to obtain reinforced vectors, which are fed to a
neural network ($f_\theta$) that serves as a feature extractor for the autoencoder described in Section 2.3.

The outputs of the TiFe Attention Model can be treated as weighted input vectors where a 
higher value indicates a better conformance of the observed values with the relationships learnt 
by the model. 
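
To make the sequence of operations concrete (matrix multiplication, scaling, softmax, re-weighting of the input), the following is a simplified sketch of attention along the time and feature axes. It is not the TiFe Attention model itself: the learnt mappings f, g and h are omitted, the scaling factors and the way the two weight matrices are combined are assumptions made here for illustration, and the name `time_feature_attention` is hypothetical.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def time_feature_attention(x: np.ndarray):
    """x is one window of shape (T, N).

    Returns the time attention (T, T), the feature attention (N, N)
    and the re-weighted window (T, N).
    """
    T, N = x.shape
    # Time attention: similarity between time steps, scaled by sqrt(N) (assumption)
    a_t = softmax(x @ x.T / np.sqrt(N), axis=-1)
    # Feature attention: similarity between features, scaled by sqrt(T) (assumption)
    a_f = softmax(x.T @ x / np.sqrt(T), axis=-1)
    # Re-weight the input along both axes (one possible combination, assumed here)
    weighted = a_t @ x @ a_f.T
    return a_t, a_f, weighted

rng = np.random.default_rng(0)
window = rng.random((6, 3))  # T = 6 time steps, N = 3 features
A_t, A_f, reinforced = time_feature_attention(window)
print(A_t.shape, A_f.shape, reinforced.shape)
```

Each row of `A_t` and `A_f` sums to one, so the rows can be read as weights over time steps and over features respectively, which is what the attention maps in the results section visualise.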


2.3. Encoder-Decoder Model 
The model has an encoder $E: \mathbb{R}^{T \times N} \to \mathbb{R}^{T \times l}$ and a decoder $D: \mathbb{R}^{T \times l} \to \mathbb{R}^{T \times N}$, where $l$
is the latent dimension of the model. The encoder attempts to construct a latent space with
the given inputs, and the decoder reconstructs from the latent space. Since the model
is trained on usual-behaviour data, the reconstruction error would be considerably larger
if the model is fed an abnormal data point. The loss associated with this reconstruction
serves as a metric to detect anomalies. Usually, a threshold is set beyond which the points are
considered anomalies. In the current work, we present only a qualitative way of understanding
the reconstruction loss with the help of attention maps.

Figure 1: A schematic diagram of the TiFeAttention Model Architecture, where $X^T$ is the transpose of matrix $X$ and $X$ is $X_t$
or $X_f$ depending on whether time or feature attention is required

The overall architecture is shown in Figure 2.
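
The reconstruction-based detection described in this section can be sketched as below. The paper presents only a qualitative analysis and does not specify a threshold rule, so the mean-plus-k-standard-deviations threshold, the factor `k` and both function names are assumptions for illustration only.

```python
import numpy as np

def reconstruction_error(x: np.ndarray, x_hat: np.ndarray) -> np.ndarray:
    """Mean squared error per time step: inputs (M, T, N) -> output (M, T)."""
    return np.mean((x - x_hat) ** 2, axis=-1)

def flag_anomalies(errors: np.ndarray, k: float = 1.0) -> np.ndarray:
    """Flag time steps whose error exceeds mean + k * std (hypothetical rule)."""
    threshold = errors.mean() + k * errors.std()
    return errors > threshold

# Toy example: one window, T = 5, N = 2; the reconstruction misses a spike at step 3
x = np.zeros((1, 5, 2))
x_hat = np.zeros((1, 5, 2))
x[0, 3, :] = 1.0
err = reconstruction_error(x, x_hat)
flags = flag_anomalies(err, k=1.0)
print(flags)  # only the step with the spike is flagged
```

In practice the threshold would be tuned on data known to represent usual behaviour; the rule used here is only a placeholder.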


2.4. Dataset Preparation 


The paper is organised around three datasets. Two of them are synthetically generated to 
demonstrate the application of the TiFeAttention model. The last one is a real-world dataset. 


2.4.1. Data-I: Synthetic data for investigating the contributions of TiFe Attention model to the 
autoencoder 

The following data have been generated to assess the contribution of the TiFeAttention model
to the autoencoder architecture. A typical weekly profile is shown in Figure 3. Anomalies
introduced artificially as spikes over one week are shown in Figure 4.

Figure 2: Pictorial representation of the TiFe Attention Model with Encoder-Decoder architecture


2.4.2. Data-II: Synthetic data for investigating the interpretability of the model using time and 
feature attention maps 

For illustration purposes, the following data is generated. The test aims to study and under- 
stand how the model learnt feature relationships across the variables. It also helps understand 
how attention maps can be used to tune the model based on consumer requirements. 

Consider a room with two air-conditioning units. One of the air-conditioners is sufficient to
meet the room heating load, and from last year's data, it was observed that both operate
in a biweekly fashion, as shown in Figure 5. A synthetic anomaly where the operation of the 2
air-conditioning units is swapped for one whole day is shown in Figure 6. This data is used in
Section 3 to demonstrate how attention maps can be used to understand the relations captured by the model.
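
A dataset of this kind could be generated along the following lines. This is a hedged sketch, not the generator used for Data-II: it assumes a two-week period in which the first unit runs during the first week and the second unit during the second week, a constant load of 1 kW while a unit is on, and hypothetical function names. The actual profiles in Figure 5 may differ.

```python
import numpy as np

HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY

def biweekly_profile(n_periods: int = 1, load_kw: float = 1.0) -> np.ndarray:
    """Return hourly loads of shape (hours, 2) for two AC units.

    Assumption: AC1 runs in week 1 and AC2 in week 2 of every two-week period.
    """
    hours = 2 * HOURS_PER_WEEK * n_periods
    week_index = (np.arange(hours) // HOURS_PER_WEEK) % 2
    ac1 = np.where(week_index == 0, load_kw, 0.0)
    ac2 = np.where(week_index == 1, load_kw, 0.0)
    return np.stack([ac1, ac2], axis=1)

def swap_units_for_day(data: np.ndarray, day: int) -> np.ndarray:
    """Swap the two units for one whole day (0-based day index)."""
    out = data.copy()
    start = day * HOURS_PER_DAY
    stop = start + HOURS_PER_DAY
    out[start:stop] = data[start:stop, ::-1]  # reverse the column order
    return out

normal = biweekly_profile()
anomalous = swap_units_for_day(normal, day=1)  # second day of the first week
print(normal.shape, anomalous[24:48].sum(axis=0))
```

The swapped day contains the same values as a normal day but assigned to the other unit, which matches the observation in the results that the values themselves are not abnormal and only their arrangement is.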


2.4.3. Data-III: Real world data of an academic office building 

The proposed model is tested using the real-world dataset in [11]. It has data from over 55
air-conditioning units in an academic office in Thailand, sampled every minute. Apart from
air-conditioning loads, there are lighting and plug loads. For the current study, only the
air-conditioning loads shown in Table 1 are considered. These loads are strategically chosen for
the following reasons:


• The units belonging to different zones follow different periodic cycles, as observed, so the
model is expected to capture these dependencies feature-wise.

• The power consumption of units in the same zone is highly correlated, while
units across different zones are not related in general. The units in the same zone tend to
have a load-sharing pattern which can only be reconstructed if the model learns to capture
the relations among the load patterns of units in the same zone at various times. This can be
verified with the attention maps.

Figure 3: Data-I: Typical hourly power consumption for 1 week
Figure 4: Data-I: Anomalous weekly profile with abnormally high/low value
Figure 6: Data-II: Hourly power consumption profile with features interchanged as described in Section 2.4.2

The missing values are imputed with the mean values and then scaled using Eqn. 1.


$$x_i = \frac{x_i - x_{i,\min}}{x_{i,\max} - x_{i,\min}} \tag{1}$$

where $i = 1, 2, \ldots, N$ and $x_{i,\min}$, $x_{i,\max}$ are the minimum and maximum values of feature $x_i$.
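
The preprocessing described here (mean imputation followed by the min-max scaling of Eqn. 1) can be sketched as follows. The function name `impute_and_scale`, the use of NaN to mark missing values and the guard for constant features are assumptions added for illustration.

```python
import numpy as np

def impute_and_scale(data):
    """data: array of shape (time, N) with NaN marking missing values.

    Returns the scaled data together with the per-feature minimum and maximum.
    """
    data = np.asarray(data, dtype=float)
    # Replace missing values with the mean of the corresponding feature
    col_mean = np.nanmean(data, axis=0)
    filled = np.where(np.isnan(data), col_mean, data)
    # Min-max scaling per feature, as in Eqn. 1
    x_min = filled.min(axis=0)
    x_max = filled.max(axis=0)
    # Avoid division by zero for constant features (assumption, not in the paper)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (filled - x_min) / span, x_min, x_max

raw = np.array([[1.0, 10.0],
                [np.nan, 20.0],
                [3.0, 30.0]])
scaled, x_min, x_max = impute_and_scale(raw)
print(scaled)
```

Keeping `x_min` and `x_max` makes it possible to map reconstructed windows back to physical units (kW) when comparing them with the original profiles.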


Floor | Zone | Number of units
1 | 2 | 4
2 | 1 | 1
2 | 2 | 3


Table 1: Data considered for model training 


A snapshot of 3 weeks of hourly data for all 8 AC units is shown in Figure 7.


2.5. Model training 
Hourly samples are used throughout the study. The Adam optimizer with default parameters is
used in TensorFlow for all the models. The hyperparameters of the models are tabulated in Table 2.


3. Results and Discussions 


3.1. Data-I: Investigating the contributions of TiFe Attention model to the autoencoder 


To demonstrate the contributions of the TiFe Attention model, the data described in Section 2.4.1 is used.
The data with anomalies is used to train both the proposed model and an autoencoder
without the TiFe Attention model but with the same latent size. A comparison of the reconstruction
profiles in windows of usual behaviour is shown in Figure 9. Both models were able to
capture the original distribution. A comparison of the windows with abnormal patterns is shown
in Figure 10. Though the autoencoder alone is able to reconstruct the observed window, it lacked the
contextual information to constrain it from reconstructing the peak observed in the early morning
of Sunday in Figure 10b. The TiFe Attention model aided in providing the contextual information by
reducing the weight of the sample for reconstruction, as visualised in the attention map shown in
Figure 8. It can be observed in Figure 8b that the bands are lightened in regions with unusual spikes,
indicating the presence of anomalies and reinforcing that the TiFe Attention model provided contextual
information to the encoders.

Table 2: Model Parameters

Parameter | Data-I | Data-II | Data-III
$d_a$ | 64 | 64 | 32
$l$ | 64 | 32 | 16
batch size | 128 | 256 | 128
window length | 168 | 336 | 168
number of features | 1 | 2 | 8


3.2. Data-II: Investigating the interpretability of the model using time and feature attention maps


The model is trained on Data-II (described in Section 2.4.2) and the obtained attention maps are
shown in Figure 11. As seen by comparing Figure 6 with Figure 5, the only difference exists on Monday, when
AC2 operates instead of AC1. The unusually low value for AC1 triggered the TiFe Attention
model to increase the weights for the observed values for reconstruction, as seen in Figure 11b,
where a patch with higher weights is found for the first week. Also, both normal and anomalous
feature attention maps are similar, which indicates that the values being observed do not indicate
any abnormality; only the sequence in which they appear is an anomaly. These results signify
the importance of attention maps in understanding whether the relationships captured and used by
the model for reconstructing the sequence are valid. They also help in understanding whether the
reconstruction error is due to underfitting of the model or an underlying anomaly.


3.3. Data-III: Investigating model performance on real-world data


The proposed model is trained on the dataset, and the reconstruction profile for 3 weeks is shown
in Figure 13. The attention maps are analysed first, and the key observations are noted below:


1. Figure 12b shows that the model gives more weightage to the values of the same feature across
time for most of the features except f2 z2 AC 3. For this particular unit, weightages are
given to 2 features across the time steps. This can be confirmed from the profiles shown in
Figure 7g and Figure 7h, where the profiles are closely related since they belong to the same
zone. So if one of them is on/off, the reconstruction happens in a way that both are
on/off.

2. Figure 12a shows that the model understood 2 different kinds of periodicity in the data
(Monday-Saturday, Tuesday-Friday), as seen in the different weights for the horizontal bands. This can
be confirmed from the profiles shown in Figure 7a (Tuesday-Friday) and Figure 7e
(Monday-Saturday).
We now demonstrate how the above observations help not only in identifying an anomaly
but also in establishing qualitative differences between the identified anomalies and in understanding
the reason for the model reconstruction based on the relationships it learnt.


• In Figure 13a, it can be observed that a spike on the first Wednesday is not reconstructed. From
Observation 2, this particular region belongs to the set with the Tuesday-Friday period, and the
model accordingly reduced the weight for the spike to prevent its reconstruction.

• In Figure 13g, a peak is reconstructed by the model on the first Tuesday. This region
corresponds to the one with a period of one day; based on the observations from previous days, the
model predicted a high power consumption. Also note that the same peak has been reconstructed
in Figure 13h, since both are related (Observation 1).


4. Conclusion 


Judicious consumption of energy during building operations can provide significant value
towards achieving energy conservation. Automated discovery of anomalous consumption
patterns can help in developing policies directed at minimising negligent consumer behaviours.
The main idea behind current anomaly detection revolves around identifying normal
consumption patterns and raising flags in case of anomalies. Several machine learning techniques
have been reported to model the normal consumption pattern. The current work proposes a novel
attention mechanism to capture the normal consumption behaviour. Sample case studies demonstrate
that the proposed architecture captures these patterns and that the attention maps thus generated can be
used to tune the model parameters without overfitting in case of datasets where the training data
includes both normal and anomalous behaviours (refer to Section 3.1). The proposed architecture
helps not only in identifying the anomalies but also provides a way to qualitatively classify them
using attention maps.
(a) Data-III: Typical hourly consumption profile for AC in Floor 1, Zone 2, AC 1
(b) Data-III: Typical hourly consumption profile for AC in Floor 1, Zone 2, AC 2
(c) Data-III: Typical hourly consumption profile for AC in Floor 1, Zone 2, AC 3 (y-axis: f1 z2 AC3 (kW); x-axis: days of the week, Sunday to Saturday, over three weeks)
(d) Data-III: Typical hourly consumption profile for AC in Floor 1, Zone 2, AC 4
(e) Data-III: Typical hourly consumption profile for AC in Floor 2, Zone 1, AC 1
(f) Data-III: Typical hourly consumption profile for AC in Floor 2, Zone 2, AC 1

Figure 7: Data-III
(a) Attention map for usual pattern
(b) Attention map for window with anomalous spike

Figure 8: Data-I: Comparison of attention maps for normal and anomalous patterns

(b) Normal reconstruction profile without TiFe Attention

Figure 9: Data-I: Comparison of reconstruction profiles with and without TiFe attention

(a) Normal reconstruction profile with TiFe Attention
(b) Normal reconstruction profile without TiFe Attention

Figure 10: Data-I: Comparison of reconstruction profiles with and without TiFe attention

(a) Attention map for normal usage
(b) Attention map for anomalous usage

Figure 11: Data-II: Comparison of attention maps

(a) Encoder Time Attention Map
(b) Encoder Feature Attention Map

Figure 12: Data-III: Attention Maps from TiFe Attention model

(a) Data-III: Reconstruction vs Original hourly power consumption of Floor 1, Zone 2, AC 1
(b) Data-III: Reconstruction vs Original hourly power consumption of Floor 1, Zone 2, AC 2
(c) Data-III: Reconstruction vs Original hourly power consumption of Floor 1, Zone 2, AC 3
(d) Data-III: Reconstruction vs Original hourly power consumption of Floor 1, Zone 2, AC 4
(e) Reconstruction vs Original hourly power consumption of Floor 2, Zone 1, AC 1
(f) Reconstruction vs Original hourly power consumption of Floor 2, Zone 2, AC 1
(g) Reconstruction vs Original hourly power consumption of Floor 2, Zone 2, AC 2
(h) Reconstruction vs Original hourly power consumption of Floor 2, Zone 2, AC 3

Figure 13: Data-III